Event-Learning with a Non-Markovian Controller
Authors
Abstract
Recently, a novel reinforcement learning algorithm called event-learning (E-learning) was introduced. The algorithm is based on events, which are defined as ordered pairs of states. In this setting, the agent optimizes the selection of desired sub-goals by traditional value-policy function iteration, and uses a separate algorithm, called the controller, to achieve these goals. The advantage of event-learning lies in its potential in non-stationary environments, where the near-optimality of the value iteration is guaranteed by the generalized ε-stationary MDP (ε-MDP) model. Using a particular non-Markovian controller, the SDS controller, an ε-MDP problem arises in E-learning. We illustrate the properties of E-learning augmented with the SDS controller by computer simulations.
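The split between sub-goal selection and a controller can be sketched in a toy form. Everything below is an illustrative assumption, not the authors' implementation: the chain environment, the greedy distance-based controller (standing in for the SDS controller), and the Q-learning-style update over event values are all invented for this sketch.

```python
# Toy sketch of event-learning: the agent learns values E[(s, s_desired)]
# over events (ordered state pairs) and delegates reaching the desired
# successor to a separate controller. All names and dynamics are
# illustrative assumptions, not the paper's algorithm or code.
import random

random.seed(0)
N_STATES = 5                  # chain of states 0..4, reward at the right end
ACTIONS = [-1, +1]

def step(s, a):
    """Deterministic chain dynamics with reward 1 at the last state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def controller(s, s_desired):
    """Hypothetical stand-in for the SDS controller: pick the action
    whose successor lies closest to the desired state."""
    return max(ACTIONS, key=lambda a: -abs(step(s, a)[0] - s_desired))

# Event values, updated with a Q-learning-style rule over sub-goals.
E = {(s, sd): 0.0 for s in range(N_STATES) for sd in range(N_STATES)}
alpha, gamma, epsilon = 0.1, 0.9, 0.5

s = 0
for _ in range(10000):
    # epsilon-greedy selection of the desired successor (the sub-goal)
    if random.random() < epsilon:
        sd = random.randrange(N_STATES)
    else:
        sd = max(range(N_STATES), key=lambda x: E[(s, x)])
    a = controller(s, sd)          # controller tries to realize the event
    s2, r = step(s, a)             # actual successor may differ from sd
    best_next = max(E[(s2, x)] for x in range(N_STATES))
    E[(s, sd)] += alpha * (r + gamma * best_next - E[(s, sd)])
    s = 0 if s2 == N_STATES - 1 else s2   # reset episode at the goal
```

After training, events that move the agent toward the rewarded state carry higher values than events pointing away from it, which is what the sub-goal iteration exploits.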
Similar Resources
Reinforcement Learning in Markovian and Non-Markovian Environments
This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. ...
Noisy K Best-Paths for Approximate Dynamic Programming with Application to Portfolio Optimization
We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management, where we can train a controller to directly optimize a Sharpe ratio (or other risk-averse, non-additive) utility function. We illustrate the approach by demonstrating experimental resul...
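The transformation this snippet describes can be illustrated on a toy horizon: rank action paths by a non-additive, Sharpe-like utility and use the best path's decisions as supervised targets. The per-step returns, the utility function, and the exhaustive path enumeration below are all toy assumptions, not the paper's method.

```python
# Illustrative sketch: turn a non-Markovian decision problem into a
# supervised one by taking the K best action paths under a non-additive
# (Sharpe-like) utility. Returns and names are invented for this example.
import itertools
import statistics

T = 4                                        # decision horizon
RETURNS = {0: [0.01, -0.02, 0.03, 0.01],     # per-step return for action 0
           1: [0.02, 0.05, -0.01, 0.02]}     # per-step return for action 1

def utility(path):
    """Sharpe-like ratio: mean over std of the realized return sequence.
    Non-additive, so it cannot be decomposed into per-step rewards."""
    r = [RETURNS[a][t] for t, a in enumerate(path)]
    sd = statistics.pstdev(r)
    return statistics.mean(r) / sd if sd > 0 else statistics.mean(r)

K = 3
paths = sorted(itertools.product([0, 1], repeat=T), key=utility, reverse=True)
best_k = paths[:K]

# Supervised dataset: at each time step, the label is the action taken by
# the best path; a controller could then be fit to these (step, action) pairs.
dataset = [(t, best_k[0][t]) for t in range(T)]
```

In a realistic setting the exhaustive enumeration would be replaced by a K-best-paths search, since the number of paths grows exponentially with the horizon.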
Auxiliary Gibbs Sampling for Inference in Piecewise-Constant Conditional Intensity Models
A piecewise-constant conditional intensity model (PCIM) is a non-Markovian model of temporal stochastic dependencies in continuous-time event streams. It allows efficient learning and forecasting given complete trajectories. However, no general inference algorithm has been developed for PCIMs. We propose an effective and efficient auxiliary Gibbs sampler for inference in PCIMs, based on the idea ...
New Approach to Exponential Stability Analysis and Stabilization for Delayed T-S Fuzzy Markovian Jump Systems
This paper is concerned with delay-dependent exponential stability analysis and stabilization for continuous-time T-S fuzzy Markovian jump systems with mode-dependent time-varying delay. By constructing a novel Lyapunov-Krasovskii functional and utilizing some advanced techniques, less conservative conditions are presented to guarantee that the closed-loop system is mean-square exponentially stable....
Human Learning in Non-Markovian Decision Making
Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL), where a series of decisions must be made and a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depend only on the c...
Publication date: 2002